472 research outputs found

Recompression of SLPs

Author: Jez Artur
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)
Publication date: 01/01/2017

In this talk I will survey the recompression technique in the case of SLPs. The technique is based on applying simple compression operations (replacing pairs of two different letters by a new letter, and replacing maximal repetitions of a letter by a new symbol) to strings represented by SLPs. To this end we modify the SLPs so that such compression operations can be performed on them. For instance, when we want to replace ab in the string, the SLP has a production X → aY, and the string generated by Y is bw, we alter the rule of Y so that it generates w and replace Y with bY in all rules. In this way the rule becomes X → abY, so ab can be replaced; similar operations are defined for the right sides of the nonterminals. As a result, we are interested mostly in the SLP representation rather than in the string itself and its combinatorial properties. What we need to control, though, is the size of the SLP: with appropriate choices of substrings to be compressed, it can be shown to stay linear. The proposed method turned out to be surprisingly efficient and applicable in various scenarios. For instance, it can be used to test the equality of SLPs in time O(n log N), where n is the size of the SLP and N the length of the generated string. It can also be used to approximate the smallest SLP for a given string with approximation ratio O(log(n/g)), where n is the length of the string and g the size of the smallest SLP for this string, matching the best known bounds.
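The two compression operations and the letter-popping step from the abstract above can be sketched in a few lines of Python. This is an illustrative sketch only, not the algorithm from the talk: the function names (`compress_pair`, `compress_blocks`, `pop_left`, `expand`) and the dictionary representation of an SLP are assumptions made for the example, and `pop_left` assumes that Y's rule starts with a terminal and has at least two symbols.

```python
def compress_pair(s, a, b, fresh):
    """Replace every occurrence of the pair ab (a != b) by the symbol `fresh`."""
    assert a != b
    out, i = [], 0
    while i < len(s):
        if i + 1 < len(s) and s[i] == a and s[i + 1] == b:
            out.append(fresh)
            i += 2
        else:
            out.append(s[i])
            i += 1
    return out


def compress_blocks(s, a, fresh_for_length):
    """Replace each maximal block a^k with k >= 2 by a symbol depending on k."""
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1  # s[i:j] is a maximal run of one letter
        if s[i] == a and j - i >= 2:
            out.append(fresh_for_length(j - i))
        else:
            out.extend(s[i:j])
        i = j
    return out


def expand(rules, sym):
    """Return the list of terminals generated by `sym` (hypothetical helper)."""
    if sym not in rules:
        return [sym]
    return [t for child in rules[sym] for t in expand(rules, child)]


def pop_left(rules, Y):
    """Drop the first letter of Y's rule and write it before every use of Y."""
    first = rules[Y][0]
    assert first not in rules, "sketch assumes Y's rule starts with a terminal"
    rules[Y] = rules[Y][1:]
    for X, rhs in rules.items():
        new = []
        for sym in rhs:
            new.extend([first, sym] if sym == Y else [sym])
        rules[X] = new  # existing key, so the dict size does not change
    return rules


# X -> aY, Y -> bc: after popping b from Y, the pair ab is explicit in X's rule.
rules = {"X": ["a", "Y"], "Y": ["b", "c"]}
pop_left(rules, "Y")
print(rules["X"], rules["Y"])
```

After `pop_left`, the string generated by X is unchanged, but the pair ab now appears explicitly in the rule of X, so `compress_pair` can be applied to the right-hand side directly.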

Dagstuhl Research Online Publication Server

Word Equations in Nondeterministic Linear Space

Author: Jez Artur
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)
Publication date: 01/01/2017

Satisfiability of word equations is an important problem at the intersection of formal languages and algebra: given two sequences consisting of letters and variables, we are to decide whether there is a substitution for the variables that turns the equation into a true equality of strings. The computational complexity of this problem remains unknown, with the best lower and upper bounds being, respectively, NP and PSPACE. Recently, the novel technique of recompression was applied to this problem, simplifying the known proofs and lowering the space complexity to (nondeterministic) O(n log n). In this paper we show that satisfiability of word equations is in nondeterministic linear space, and thus the language of satisfiable word equations is context-sensitive. We use the known recompression-based algorithm and additionally employ Huffman coding for letters. The proof, however, uses an analysis of how the fragments of the equation depend on each other, as well as a new strategy for the nondeterministic choices of the algorithm, which uses several new ideas to limit the space occupied by the letters.
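To make the problem statement concrete, the following sketch checks whether one given substitution satisfies a word equation. It only verifies a candidate solution and says nothing about the decision procedure of the paper. The representation (lists of symbols, with variables being the keys of a dictionary) and the name `satisfies` are assumptions made for this example.

```python
def satisfies(lhs, rhs, substitution):
    """Check whether `substitution` turns the equation lhs = rhs into a true equality.

    lhs and rhs are lists of symbols; a symbol that is a key of `substitution`
    is treated as a variable, every other symbol as a letter.
    """
    def apply(side):
        return "".join(substitution.get(sym, sym) for sym in side)

    return apply(lhs) == apply(rhs)


# The equation X a b = b a X over the letters a, b with one variable X.
lhs = ["X", "a", "b"]
rhs = ["b", "a", "X"]
print(satisfies(lhs, rhs, {"X": "b"}))
print(satisfies(lhs, rhs, {"X": "a"}))
```

In this example the substitution X = b yields bab on both sides, while X = a yields aab on the left and baa on the right.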

Dagstuhl Research Online Publication Server

Approximation of smallest linear tree grammar

Authors: Jez Artur; Lohrey Markus
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)
Publication date: 01/01/2014

A simple linear-time algorithm for constructing a linear context-free tree grammar of size O(r^2 g log n) for a given input tree T of size n is presented, where g is the size of a minimal linear context-free tree grammar for T, and r is the maximal rank of symbols in T (which is a constant in many applications). This is the first example of a grammar-based tree compression algorithm with an approximation ratio polynomial in g. The analysis of the algorithm uses an extension of the recompression technique (used in the context of grammar-based string compression) from strings to trees.

Dagstuhl Research Online Publication Server

MPG.PuRe

LZ77 Factorisation of Trees

Authors: Gawrychowski Pawel; Jez Artur
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2016)
Publication date: 01/01/2016

We generalise the fundamental concept of LZ77 factorisation from strings to trees. A tree is represented as a collection of edge-disjoint fragments, each of which either consists of one node or has already occurred earlier (in the BFS order). As for strings, such a collection uniquely determines the tree, so by minimising the number of fragments we obtain a compressed representation of the tree. We show that our generalisation has several useful properties of the standard LZ77 factorisation: it can be computed in polynomial time, and its simpler variant in linear time; its size is not larger than the smallest grammar for a tree; and it can be transformed (in linear time) into a tree grammar of size O(rg log(n/(rg))), where n is the size of the tree, g the size of the smallest grammar for this tree and r the maximal arity of the nodes in the tree. This matches a recent bound of Jez and Lohrey [STACS 2014], but with a simpler and more modular proof.
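For orientation, the standard LZ77 factorisation of a string, which the abstract above generalises, can be sketched as follows. This covers the string case only, not the tree generalisation of the paper. It is a naive version with polynomial rather than linear running time, it allows a factor to overlap its earlier occurrence, and the name `lz77_factorise` is an assumption made for this example.

```python
def lz77_factorise(s):
    """Greedy LZ77 factorisation of a string (naive, self-referential variant).

    Each factor is either a single letter or the longest prefix of the
    remaining suffix that also starts at some earlier position.
    """
    factors, i = [], 0
    while i < len(s):
        best = 0
        for j in range(i):  # candidate earlier starting positions
            k = 0
            while i + k < len(s) and s[j + k] == s[i + k]:
                k += 1
            best = max(best, k)
        length = max(best, 1)  # no earlier occurrence: a one-letter factor
        factors.append(s[i:i + length])
        i += length
    return factors


print(lz77_factorise("abababb"))
```

Concatenating the factors always gives back the input string, which is the string analogue of the statement that the collection of fragments uniquely determines the tree.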

Dagstuhl Research Online Publication Server

Sliding Windows over Context-Free Languages

Authors: Ganardi Moses; Jez Artur; Lohrey Markus
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)
Publication date: 01/01/2018

We study the space complexity of sliding window streaming algorithms that check membership of the window content in a fixed context-free language. For regular languages, this complexity is either constant, logarithmic or linear [Moses Ganardi et al., 2016]. We prove that every context-free language whose sliding window space complexity is log_2(n) - omega(1) must be regular and has constant space complexity. Moreover, for every c in N, c >= 1, we construct a (nondeterministic) context-free language whose sliding window space complexity is in O(n^(1/c)) \ o(n^(1/c)). Finally, we give an example of a deterministic one-counter language whose sliding window space complexity is Theta((log n)^2).

Dagstuhl Research Online Publication Server

Edit Distance with Block Operations

Authors: Ganczorz Michal; Gawrychowski Pawel; Jez Artur; Kociumaka Tomasz
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 26th Annual European Symposium on Algorithms (ESA 2018)
Publication date: 01/01/2018

We consider the problem of edit distance in which block operations are allowed, i.e. we ask for the minimal number of (block) operations that are needed to transform a string s into t. We give O(log n) approximation algorithms, where n is the total length of the input strings, for the variants of the problem which allow the following sets of operations: block move; block move and block delete; block move and block copy; block move, block copy, and block uncopy. The results still hold if we additionally allow any of the following operations: character insert, character delete, block reversal, or block involution (involution is a generalisation of reversal). Previously, algorithms were known only for the first and last variants, and they had approximation ratios O(log n log^* n) and O(log n (log^* n)^2), respectively. The edit distance with block moves is equivalent, up to a constant factor, to the common string partition problem, in which we are given two strings s, t and the goal is to partition s into a minimal number of parts such that they can be permuted to obtain t. Thus we also obtain an O(log n) approximation for this problem (compared to the previous O(log n log^* n)). The results use a simplification of the previously used technique of locally consistent parsing, which groups short substrings of a string into phrases so that similar substrings are guaranteed to be grouped in a similar way. Instead of a sophisticated parsing technique relying on deterministic coin tossing, we use a simple one based on a partition of the alphabet into two subalphabets. In particular, this lowers the running time from O(n log^* n) to O(n). The new algorithms (for block copy or block delete) use a similar algorithm, but the analysis is based on a specially tuned combinatorial function on sets of numbers.
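One plausible reading of the parsing based on a partition of the alphabet into two subalphabets is sketched below: every pair ab with a in the first subalphabet and b in the second becomes a phrase, and all other letters stay as one-letter phrases. This is an assumption made for illustration, not the exact procedure of the paper. The function name `pair_by_partition` and the way the partition is passed in are hypothetical, and the handling of blocks of identical letters is left out.

```python
def pair_by_partition(s, left, right):
    """Group a string into phrases using a partition of the alphabet.

    A pair ab with a in `left` and b in `right` becomes one phrase. Because
    `left` and `right` are disjoint, two such pairs can never overlap, so the
    grouping of a pair depends only on the pair itself.
    """
    assert not (set(left) & set(right)), "subalphabets must be disjoint"
    phrases, i = [], 0
    while i < len(s):
        if i + 1 < len(s) and s[i] in left and s[i + 1] in right:
            phrases.append(s[i:i + 2])
            i += 2
        else:
            phrases.append(s[i])
            i += 1
    return phrases


print(pair_by_partition("abcab", {"a", "c"}, {"b"}))
```

In this example both occurrences of ab are grouped identically regardless of their context, which is the local consistency property the abstract refers to.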

Dagstuhl Research Online Publication Server